For Rachel, snow_depth data 'cleaning':

QRRC Snow Depth Climatology Notebook

Snow Depth Temperature climatology analysis

Hadleigh Thompson
Northern Hydrometerology Group
UNBC

Feb 2017
In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import os

import plotly.plotly as py
import cufflinks as cf
import pandas as pd

cf.set_config_file(offline=True, theme='ggplot')

plt.style.use('ggplot')
% matplotlib inline
In [2]:
columns = ['Snow_Depth']
snow_profile = pd.DataFrame(columns=columns)
snow_profile.head()
Out[2]:
Snow_Depth

First, we read in each of files we have for QRRC, from 2006 to 2016:

(This is all the raw data, erroneous values included)

In [3]:
for file in [f for f in os.listdir('../Data/QRRC/') if f.endswith('.xlsx')]:
    
    file_path = os.path.splitext(os.path.basename(file))
    filename = file_path[0]
    
    # read and convert Timestamp to pd DateTime object
    df = pd.read_excel('../Data/QRRC/{}' .format(file), header=0, skiprows=(1,2), engine='xlrd')
    df.Timestamp = pd.to_datetime(df.Timestamp, infer_datetime_format=True)
    df.set_index(['Timestamp'], inplace=True, drop=True)
    
    # ensure the series is of the correct type to perform the required operations
    series_snowdepth = df.DBTCDT.astype(float)
    #series_snowdepth = series_snowdepth[(series_snowdepth >= 0.) & (series_snowdepth < 221.)]
    
    # empty df to conncat to existing df to create long timeseries
    snow_profile2 = pd.DataFrame()
    snow_profile2['Snow_Depth'] = series_snowdepth
    
    snow_profile2['Snow_Depth'][(snow_profile2.index.month >= 5) & (snow_profile2.index.month <= 9)] = 0.0
    
    snow_profile = pd.concat([snow_profile, snow_profile2], axis=0)
    
    print('Added {}' .format(filename))
    
snow_profile.sort_index(inplace=True)
Added QRRC_2006
Added QRRC_2007
Added QRRC_2008
Added QRRC_2009
Added QRRC_2010
Added QRRC_2011
Added QRRC_2012
Added QRRC_2013
Added QRRC_2014
Added QRRC_2015
Added QRRC_2016

Then a quick look plot tells us that we need to aviod negative values and we also see the 222cm spikes (installed sensor height):

In [4]:
snow_profile.iplot(kind='line', filename='cf-simple-line', color = 'rgb(100, 100, 235)',
                     title='Quick-look Plotly Trace 1')

So this line of code gets rid of negative values and anything above 221cm, since the sensor height is 222cm, and we can see that we never get enough snow in any singular season to worry if these 2m+ snow depths are real or not:

In [5]:
snow_profile[(snow_profile['Snow_Depth'] < 0.) | (snow_profile['Snow_Depth'] > 221.)] = np.nan
In [6]:
snow_profile.iplot(kind='line', filename='cf-simple-line', color = 'rgb(100, 100, 235)',
                     title='Quick-look Plotly Trace 2')

Then to ensure summer values are 0, I have gone from May 1st, to the November 1st. Let me kow if you want this to change:

In [7]:
snow_profile['Snow_Depth'][(snow_profile.index.month >= 5) & (snow_profile.index.month <= 10)] = 0.0
In [8]:
snow_profile.iplot(kind='line', filename='cf-simple-line', color='rgb(100,100,235)',
                     title='Quick-look Plotly Trace 3')

There's still some funky things going on in November some years, it wont be too hard to tidy that up at a later date just by running the code again with a few extra lines to exclude those particular dates.

This is now what the data in the 'Master Climatology' excel file will look like, but since you may have to explain what happened to the anolmalies, I though it was easier for you to see them for yourself.

In [9]:
snow_profile.to_excel('../Data/Climatology_data/QRRC_Snow_Depth_Climatology.xlsx', sheet_name='Data', na_rep='Nan')